Big Data Methods for Computational Linguistics

Authors

Gerhard Weikum

Johannes Hoffart

Ndapandula Nakashole

Marc Spaniol

Fabian M. Suchanek

Mohamed Amir Yosef

Abstract

Many tasks in computational linguistics traditionally rely on hand-crafted or curated resources like thesauri or word-sense-annotated corpora. The availability of big data, from the Web and other sources, has changed this situation. Harnessing these assets requires scalable methods for data and text analytics. This paper gives an overview of our recent work that utilizes big data methods for enhancing semantics-centric tasks dealing with natural language texts. We demonstrate a virtuous cycle in harvesting knowledge from large data and text collections and leveraging this knowledge in order to improve the annotation and interpretation of language in Web pages and social media. Specifically, we show how to build large dictionaries of names and paraphrases for entities and relations, and how these help to disambiguate entity mentions in texts.


Similar resources

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....


Toward Enhanced Metadata Quality of Large-Scale Digital Libraries: Estimating Volume Time Range

Metadata is a special type of data that describes data. In the age of Big Data, the role of metadata has become more prominent–it is obvious that big data needs high-quality metadata description as it becomes less and less possible for humans to go over all the data (if human readable) with the exponential growth of data sets. In this study we try to enhance metadata records (publication dates)...


CELEX: Building a Multifunctional, Polytheoretical Lexical Database

Recent developments in Computational Linguistics have brought about an increasing interest in large scale lexical modules, at a time when current trends in hardware and software engineering bring this goal within reach. This paper describes one such system, the CELEX database. For expository purposes only, this system is contrasted with another big project that starts from different premiss...


Automatic Extraction of Causal Relations from Natural Language Texts: A Comprehensive Survey

Automatic extraction of cause-effect relationships from natural language texts is a challenging open problem in Artificial Intelligence. Most of the early attempts at its solution used manually constructed linguistic and syntactic rules on small and domain-specific data sets. However, with the advent of big data, the availability of affordable computing power and the recent popularization of ma...


Does a Computational Linguist have to be a Linguist?

Early computational linguists supplied much of the theoretical basis that the ALPAC report said was needed for research on the practical problem of machine translation. The result of their efforts turned out to be more fundamental in that it provided a general theoretical basis for the study of language use as a process, giving rise eventually to constraint-based grammatical formalisms for syntax, ...



Journal:

IEEE Data Eng. Bull.

Volume 35


Publication date: 2012
